Group By Operations in Pandas

pandas

dataframe

group-by

aggregation

Master Pandas groupby operations: splitting data by categories, applying functions, and combining results. Learn aggregation, transformation, and filtering techniques for data analysis.

Author

Mohammed Adil Siraju

Published

September 21, 2025

GroupBy operations are one of the most powerful features in Pandas for data analysis. They allow you to:

Split data into groups based on criteria
Apply functions to each group independently
Combine the results back into a DataFrame

This notebook covers the essential groupby techniques: the basic aggregation functions and applying several aggregations at once with agg().
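The split, apply, and combine steps can be sketched by hand to show what a groupby call does conceptually. This is only an illustration on the same small dataset used in this notebook, not a description of how pandas implements it internally; the names `totals`, `manual`, and `builtin` are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

# Split: iterating over a groupby yields (group key, sub-DataFrame) pairs.
# Apply: each sub-DataFrame is reduced to a single number.
totals = {key: group['Value'].sum() for key, group in df.groupby('Category')}

# Combine: the per-group results are assembled into one Series.
manual = pd.Series(totals, name='Value')

# The built-in aggregation gives the same numbers in a single call.
builtin = df.groupby('Category')['Value'].sum()

print(manual)
print(builtin)
```

In practice the one-line form is preferable; the manual loop is useful mainly for understanding what each group contains.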

1. Setting Up Sample Data

Let’s create a sample dataset to demonstrate groupby operations. It has one categorical column (Category) to group by and one numerical column (Value) to aggregate.

import pandas as pd

# Five rows: three in category 'A' and two in category 'B'
data = {
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30]
}

df = pd.DataFrame(data)
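Before aggregating, it can help to check which rows ended up in which group. The sketch below uses the `groups` attribute and the `size()` method of the groupby object; the variable names `group_rows` and `group_sizes` are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

grouped = df.groupby('Category')

# Map each group key to the row labels that belong to it
group_rows = {key: list(labels) for key, labels in grouped.groups.items()}

# Count the rows in each group
group_sizes = grouped.size()

print(group_rows)   # rows 0, 2, 4 belong to 'A'; rows 1, 3 belong to 'B'
print(group_sizes)
```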

2. Basic Aggregation Functions

Groupby operations allow you to calculate summary statistics for each group. Here are the most common aggregation functions:

Sum Aggregation

Calculate the total sum of values for each category:

df.groupby('Category').sum()

	Value
Category
A	60
B	40
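When a DataFrame has several columns, you will often want to aggregate just one of them. A minimal sketch, assuming the same sample data: selecting the column before aggregating returns a Series, and passing `as_index=False` keeps the group key as an ordinary column instead of the index.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

# Select the column first: the result is a Series indexed by Category
totals_series = df.groupby('Category')['Value'].sum()

# as_index=False: the result is a DataFrame with Category as a regular column
totals_flat = df.groupby('Category', as_index=False)['Value'].sum()

print(totals_series)
print(totals_flat)
```

The flat form is convenient when the result feeds into a merge or a plot that expects plain columns.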

Mean Aggregation

Calculate the average value for each category:

df.groupby('Category').mean()

	Value
Category
A	20.0
B	20.0

Median Aggregation

Calculate the median (middle) value for each category:

df.groupby('Category').median()

	Value
Category
A	20.0
B	20.0

Maximum Values

Find the highest value in each category:

df.groupby('Category').max()

	Value
Category
A	30
B	25

Minimum Values

Find the lowest value in each category:

df.groupby('Category').min()

	Value
Category
A	10
B	15

Standard Deviation

Measure the spread of values within each category. By default, pandas computes the sample standard deviation (ddof=1):

df.groupby('Category').std()

	Value
Category
A	10.000000
B	7.071068

Variance

Calculate the variance (squared standard deviation) for each category:

df.groupby('Category').var()

	Value
Category
A	100.0
B	50.0
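The relationship between the two measures can be checked directly. The sketch below, on the same sample data, confirms that the variance equals the squared standard deviation and shows how the `ddof` parameter switches from the default sample statistic (ddof=1) to the population statistic (ddof=0).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

grouped = df.groupby('Category')['Value']

sample_std = grouped.std()             # default ddof=1 (divides by n - 1)
sample_var = grouped.var()             # default ddof=1
population_std = grouped.std(ddof=0)   # divides by n instead

# Variance is the square of the standard deviation
squares_match = np.allclose(sample_std ** 2, sample_var)

print(squares_match)
print(population_std)
```

With only two or three values per group, as here, the difference between the sample and population versions is substantial, so it is worth being explicit about which one you need.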

3. Multiple Aggregations

You can apply multiple aggregation functions at once using the agg() method. The result has one column per function, nested under the name of each aggregated column.

Applying Multiple Functions

Calculate sum, mean, and maximum for each category in one operation:

df.groupby('Category').agg(['sum', 'mean', 'max'])

	Value
	sum	mean	max
Category
A	60	20.0	30
B	40	20.0	25
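The list form above produces two-level column headers. If flat, descriptive column names are preferred, agg() also accepts keyword arguments of the form `new_name=(column, function)`, known as named aggregation. The output names `total`, `average`, and `largest` below are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

# Each keyword becomes an output column: name=(source column, function)
summary = df.groupby('Category').agg(
    total=('Value', 'sum'),
    average=('Value', 'mean'),
    largest=('Value', 'max'),
)

print(summary)
```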

Summary

GroupBy operations are essential for data analysis in Pandas. In this notebook, you learned:

🔢 Basic Aggregation Functions

sum(): Total values per group
mean(): Average values per group
median(): Middle value per group
max() / min(): Highest/lowest values per group
std() / var(): Measure spread within groups

📊 Advanced Operations

agg(): Apply multiple functions simultaneously
Combine statistics for comprehensive group analysis

💡 Key Concepts

Split-Apply-Combine: The three-step process of groupby operations
Aggregation: Reducing groups to single values (sum, mean, etc.)
Multiple Functions: Use agg() for comprehensive summaries

🚀 Best Practices

Choose appropriate aggregation functions for your data type
Use multiple aggregations to get complete group insights
Consider data distribution when selecting measures (mean vs median)
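To illustrate the mean versus median point, here is a small sketch on a made-up dataset (not the one used above) in which category A contains one extreme value. The DataFrame `skewed` and its numbers are invented purely for this example.

```python
import pandas as pd

# Hypothetical data: category 'A' has an outlier (300)
skewed = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Value': [10, 20, 300, 15, 20, 25],
})

# Compare both measures side by side for each group
comparison = skewed.groupby('Category')['Value'].agg(['mean', 'median'])

print(comparison)
```

The outlier pulls the mean of category A up to 110 while its median stays at 20; for category B, which has no outlier, both measures are 20.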

📈 Next Steps

Explore groupby with multiple columns
Learn filtering and transformation operations
Practice with real datasets for business insights
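As a brief preview of the first two items, the sketch below groups by two columns, then uses transform() and filter(). The `Region` column and its values are hypothetical additions for this example, as are the result names `by_two`, `Centered`, and `large`.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Region': ['North', 'North', 'South', 'South', 'North'],  # hypothetical column
    'Value': [10, 15, 20, 25, 30],
})

# Multiple columns: one result per (Category, Region) combination
by_two = df.groupby(['Category', 'Region'])['Value'].sum()

# Transformation: the result has one value per original row,
# here each value's distance from its own group's mean
df['Centered'] = df['Value'] - df.groupby('Category')['Value'].transform('mean')

# Filtering: keep only the rows of groups whose total exceeds 50
large = df.groupby('Category').filter(lambda g: g['Value'].sum() > 50)

print(by_two)
print(df)
print(large)
```

Unlike aggregation, transform() preserves the shape of the input and filter() returns a subset of the original rows rather than one row per group.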

Mastering groupby operations will significantly enhance your data analysis capabilities! 🎯